Conservative stemming for search and indexing

Authors

  • Marie-Claire Jenkins
  • Dan Smith
Abstract

In this paper, we describe a stemmer that is designed to stem conservatively to orthographically correct word forms and to recognize words that do not need to be stemmed, such as proper nouns. We compare the performance of our stemmer with that of several other stemmers and propose further work to make it more effective for information retrieval, topic detection, and other linguistic applications.
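The abstract does not give the algorithm itself, so the following is only a minimal sketch of what "conservative" stemming could look like, not the authors' method. It assumes a small lexicon of known base forms, a short hypothetical suffix list, and a crude capitalization test for proper nouns; the names `SUFFIXES`, `conservative_stem`, and `sentence_initial` are all invented for this illustration.

```python
# Illustrative sketch only: NOT the stemmer described in the paper.
# Assumptions: a lexicon of known base forms is available, and a
# capitalized token that is not sentence-initial is a proper noun.

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately short list


def conservative_stem(token, lexicon, sentence_initial=False):
    """Return a stem only if it is a real word in `lexicon`; else the token."""
    # Leave likely proper nouns untouched.
    if token[:1].isupper() and not sentence_initial:
        return token

    word = token.lower()
    # A word that is already a known base form needs no stemming.
    if word in lexicon:
        return word

    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            # Accept the bare base or the base with a restored final "e",
            # but only when the result is an orthographically valid word.
            for candidate in (base, base + "e"):
                if candidate in lexicon:
                    return candidate

    # No valid stem found: be conservative and return the token unchanged.
    return token


LEXICON = {"walk", "make", "box", "index"}

for example in ("walked", "making", "Jenkins", "bus"):
    print(example, "->", conservative_stem(example, LEXICON))
```

The design choice in this sketch is that a suffix is removed only when the result is a known word, so an unknown or ambiguous form is left as it was rather than truncated into a non-word.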

Similar resources

Multi-Language Text Indexing for Internet Retrieval

We address here the issues associated with indexing multilingual collections of information, such as those found on the internet. We examine in particular the task of language identification and the use of stemming algorithms for several European languages. We also present the lessons we have learned from our experience in using the SPIDER information retrieval system as a search engine over...

A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine

Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used an experimental method. The sample was 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...

Information Retrieval Effectiveness of Turkish Search Engines

This is an investigation of the information retrieval performance of Turkish search engines with respect to precision, normalized recall, coverage and novelty ratios. We defined seventeen query topics for Arabul, Arama, Netbul and Superonline. These queries were carefully selected to assess the capability of a search engine for handling broad or narrow topic subjects, exclusion of particular inform...

On Effective Conceptual Indexing and Similarity Search in Text Data

Similarity search in text has proven to be an interesting problem from the qualitative perspective because of inherent redundancies and ambiguities in textual descriptions. The methods used in search engines in order to retrieve documents most similar to user-defined sets of keywords are not applicable to targets which are medium to large size documents, because of even greater noise effects st...

Context based Indexing in Information Retrieval System using BST

An information retrieval system searches for data relevant to a query. Keyword searching is the basic idea of such a system, which tries to solve the large search space problem, as the documents to be searched could be of any length. This means the time to search will increase with the length of the document. Search time can be reduced by reducing the search space. In this, we are constructing a met...

Publication date: 2005